Overview

[Images: sniper.gif (left), pubg_map.jpg (right)]
Left: A skilled sniper taking out a moving target. Right: A PUBG map.

Motivation

Battle royale games have surged in popularity in recent years. The premise of such games is as follows: players are dropped onto a fictional island and fight to be the last person (or team) standing. As they roam the island, they scavenge for weapons and items crucial to their survival.

We are interested in building a prediction model for the popular battle royale game PUBG (PlayerUnknown’s Battlegrounds). In PUBG, players not only have to worry about getting killed by other players, but they also have to stay within the shrinking “safe zone,” which forces players into contact with each other. Outside of the “safe zone,” players take damage to their health at increasing rates.

Through our analysis, we aim to understand which playing strategies are more successful than others: How aggressive are the playing styles of the winners? Is it better to land in a densely or sparsely populated area? Do players who travel more place higher? The answers to these questions should be of interest to the PUBG gaming community.

Initial Questions

First, we want to investigate how well we can predict a player’s placement from their in-game actions. Exploring this question can then provide insight into how different playing styles compare. We would like to build a model that not only predicts a player’s game performance accurately, but also lets us infer whether certain playing styles are more successful.

Data

The data come from the Kaggle competition "PUBG Finish Placement Prediction." To download the data, join the competition on Kaggle and run the shell script download_data.sh.

Note: We will need to provide a direct download link for the TA.

library(tidyverse)  # read_csv() and the dplyr verbs used below

# Warning: the full dataset is very large. Read only the first 10,000 rows before scaling up.
raw_dat = read_csv("data/train_V2.csv.zip", n_max = 10000)
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   Id = col_character(),
##   groupId = col_character(),
##   matchId = col_character(),
##   damageDealt = col_double(),
##   longestKill = col_double(),
##   matchType = col_character(),
##   rideDistance = col_double(),
##   swimDistance = col_double(),
##   walkDistance = col_double(),
##   winPlacePerc = col_double()
## )
## See spec(...) for full column specifications.

Variables

Each row in the data contains one player’s post-game stats. A description of all data fields is provided in pubg_codebook.csv. We focus on the solo game mode, which accounts for only about 4% of the data. The outcome variable we are trying to predict is win_place_perc (winPlacePerc in the raw data, before the column names are cleaned).

library(janitor)  # clean_names()

# Convert column names to snake_case, then keep solo-mode rows only
clean_dat = raw_dat %>%
  clean_names() %>%
  filter(match_type == "solo")
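
As a quick check on how much of the data falls into each game mode, a sketch along the following lines tabulates the share of rows per match type. The tibble toy_dat and its values are hypothetical and serve only to make the example runnable; on the real data, raw_dat (with its original matchType column) would take its place.

```r
library(dplyr)

# Hypothetical toy data standing in for raw_dat (values are illustrative only)
toy_dat = tibble(
  matchType = c("solo", "duo", "squad", "squad-fpp",
                "solo", "duo-fpp", "squad", "squad-fpp")
)

# Share of rows in each match type, largest first
match_type_share = toy_dat %>%
  count(matchType, name = "n_rows") %>%
  mutate(share = n_rows / sum(n_rows)) %>%
  arrange(desc(share))

match_type_share
```

Running the same pipeline on the full training file, rather than on the first 10,000 rows, would give the most reliable estimate of the solo share.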

We are given a training set and a test set. The outcome variable for the test set will not be released until the Kaggle competition ends on January 30, 2019. Therefore, for the purposes of this project, we use only the provided training set, which we split into our own training and test sets.

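library(caret)  # createDataPartition()

# Split into training (80%) and test (20%) sets
set.seed(1)  # arbitrary seed so that the split is reproducible
# [, 1] turns the one-column index matrix into a plain integer vector for slice()
train_ind = createDataPartition(y = clean_dat$win_place_perc, p = 0.8, list = FALSE)[, 1]
train = clean_dat %>%
  slice(train_ind)
test = clean_dat %>%
  slice(-train_ind)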
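
The split can be sanity-checked with a short sketch like the one below, which confirms that the two subsets are disjoint, together cover every row, and have roughly the intended sizes. The tibble toy_dat, its id column, and the seed are assumptions made for illustration; clean_dat would take its place in practice.

```r
library(caret)
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical toy data standing in for clean_dat
toy_dat = tibble(
  id = 1:200,
  win_place_perc = runif(200)
)

# For a numeric outcome, createDataPartition() samples within quantile groups
# of y, so both subsets see a similar range of outcome values
train_ind = createDataPartition(y = toy_dat$win_place_perc, p = 0.8, list = FALSE)[, 1]

toy_train = toy_dat %>% slice(train_ind)
toy_test = toy_dat %>% slice(-train_ind)

# The two subsets should be disjoint and together cover every row
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy_dat))
stopifnot(length(intersect(toy_train$id, toy_test$id)) == 0)

# Fraction of rows assigned to training (should be close to p = 0.8)
nrow(toy_train) / nrow(toy_dat)
```

Comparing summary(toy_train$win_place_perc) with summary(toy_test$win_place_perc) is a further check that the outcome is distributed similarly in both subsets.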

Exploratory Data Analysis

head(train)
## # A tibble: 6 x 29
##   id          group_id     match_id     assists boosts damage_dealt dbn_os
##   <chr>       <chr>        <chr>          <int>  <int>        <dbl>  <int>
## 1 269c3fc4a2… 3c07be51998… ce9bc89b3ca…       0      1        100        0
## 2 73348483a5… 1c8e486a643… 85601fe44d5…       0      0         17.8      0
## 3 5fd6279839… bb19a05801d… 9e3c46f8acd…       0      0         36        0
## 4 18d002b46b… 00a3f236559… eccc44618c0…       0      1        236        0
## 5 d08ce24e7a… d57ed9de010… 1eda9747e31…       0      0          0        0
## 6 50820adcef… 212fc68772a… c9259d5a2a4…       0      9        167.       0
## # ... with 22 more variables: headshot_kills <int>, heals <int>,
## #   kill_place <int>, kill_points <int>, kills <int>, kill_streaks <int>,
## #   longest_kill <dbl>, match_duration <int>, match_type <chr>,
## #   max_place <int>, num_groups <int>, rank_points <int>, revives <int>,
## #   ride_distance <dbl>, road_kills <int>, swim_distance <dbl>,
## #   team_kills <int>, vehicle_destroys <int>, walk_distance <dbl>,
## #   weapons_acquired <int>, win_points <int>, win_place_perc <dbl>

Data Analysis (Modeling)

Narrative and Summary